I offered to present a simple example illustrating how the separation of citation elements from formatting templates might work for different styles and cultures. I don't think anyone really needs that now, but I'll post an illustration anyway in order to raise some important points about the interface between the BG representation of citations and the formatting templates. If we can agree on the "logical separation" of these two areas then we'll need to address how actual data values are passed through the interface to the formatting templates. By "logical separation", I mean that the BG storage format need only be concerned with the representation of the citation meta-data and values, and that the interface to a formatting-template system would be defined in a separate part of the BG standard. Any physical implementation of the formatting-template system - of which there could be several - would then have to adhere to the part of the standard that prescribes the handling of our citation meta-data.
In this small illustration, I'm using XML but not mandating it. I just needed to pluck something out of the air to help convey the end-to-end concept. Although I mention CSL, I'm also not mandating that as anything more than one possible physical implementation of a formatting-template system.
Let's assume that the meta-data for source-types are defined in a central place. Each distinct source-type has some key associated with it - we'll use a URI here.
<Sources>
...
<Source>
<URI> uri </URI>
<Elements>
<Element Name='Author' Type='Text'/>
<Element Name='Title' Type='Text'/>
<Element Name='Publisher' Type='Text'/>
<Element Name='Place' Type='Text'/>
<Element Name='Year' Type='Integer'/>
<Element Name='Page' Type='Integer'/>
</Elements>
</Source>
...
</Sources>
Note that this only contains computer-readable meta-data. None of the names or tags should appear on a screen. In principle they could all be just xyz1, etc., but that wouldn't be recommended.
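As a rough sketch of how a program might consume this central meta-data, the Python below loads a cut-down copy of the fragment and checks some input values against the declared types. The URI urn:example:basic-book, the PYTHON_TYPES mapping, and both function names are made up for the illustration; nothing here is part of any agreed format.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the meta-data above; the URI is a hypothetical placeholder.
SOURCES = """
<Sources>
  <Source>
    <URI> urn:example:basic-book </URI>
    <Elements>
      <Element Name='Author' Type='Text'/>
      <Element Name='Year' Type='Integer'/>
    </Elements>
  </Source>
</Sources>
"""

# Assumed mapping from the illustration's type names to Python types.
PYTHON_TYPES = {"Text": str, "Integer": int}

def load_source_types(xml_text):
    """Index each source-type's element names and types by its URI."""
    registry = {}
    for source in ET.fromstring(xml_text).findall("Source"):
        uri = source.findtext("URI").strip()
        registry[uri] = {e.get("Name"): e.get("Type") for e in source.iter("Element")}
    return registry

def type_errors(registry, uri, values):
    """Return the names of supplied values that are undeclared or of the wrong type."""
    declared = registry[uri]
    return [name for name, value in values.items()
            if name not in declared
            or not isinstance(value, PYTHON_TYPES[declared[name]])]

registry = load_source_types(SOURCES)
errors = type_errors(registry, "urn:example:basic-book",
                     {"Author": "Smyth", "Year": "1905"})  # Year given as text
```

Here the Year value is flagged because it arrives as text rather than as an integer.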
The associated data that is used on the screen in the User Interface is kept elsewhere, and keyed by the user's locale in addition to the same source-type URI. This is standard practice in multinational software systems, as factoring out the UI data greatly eases adaptation to multiple locales. The lack of such factoring is a weakness in the currently documented CSL data model.
<UIData>
...
<Source Locale='en_US'>
<URI> uri </URI>
<Description> Basic Book Reference </Description>
<Elements>
<ElementTitle Name='Author'> Author (or Compiler) </ElementTitle>
<ElementTitle Name='Title'> Book Title </ElementTitle>
<ElementTitle Name='Publisher'> Publisher Name </ElementTitle>
<ElementTitle Name='Place'> Place of Publication </ElementTitle>
<ElementTitle Name='Year'> Year Published </ElementTitle>
<ElementTitle Name='Page'> Page Number </ElementTitle>
</Elements>
</Source>
...
</UIData>
This contains the source description, as it might appear in, say, a drop-down list, and the individual element titles, as they might appear in a UI form.
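To make the keying concrete, here is a minimal sketch in Python that indexes a cut-down copy of the UI data by (locale, source-type URI). The URI urn:example:basic-book and the function name load_ui_data are assumptions for the illustration only.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the UI data above; the URI is a hypothetical placeholder.
UI_DATA = """
<UIData>
  <Source Locale='en_US'>
    <URI> urn:example:basic-book </URI>
    <Description> Basic Book Reference </Description>
    <Elements>
      <ElementTitle Name='Author'> Author (or Compiler) </ElementTitle>
      <ElementTitle Name='Year'> Year Published </ElementTitle>
    </Elements>
  </Source>
</UIData>
"""

def load_ui_data(xml_text):
    """Index descriptions and element titles by (locale, source-type URI)."""
    index = {}
    for source in ET.fromstring(xml_text).findall("Source"):
        key = (source.get("Locale"), source.findtext("URI").strip())
        titles = {t.get("Name"): t.text.strip()
                  for t in source.iter("ElementTitle")}
        index[key] = {
            "description": source.findtext("Description").strip(),
            "titles": titles,
        }
    return index

ui = load_ui_data(UI_DATA)
entry = ui[("en_US", "urn:example:basic-book")]
```

A UI would use entry["description"] for the drop-down list and entry["titles"] to label the form fields, while the computer-readable names such as Author never reach the screen.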
The formatting templates that generate the humanly-readable citation would be held in yet another place. These would also be keyed by the source-type URI and locale but also by the citation style. Each style would require a template for the type of reference to be generated (e.g. a source-list entry).
<Templates>
...
<Template Locale='en_US' Style='CMOS' Mode='SourceList'>
${Author,0}. <i>${Title,0}</i>. ${Place}: ${Publisher}, ${Year}.
</Template>
<Template Locale='en_US' Style='CMOS' Mode='FullNote'>
${Author,1}, <i>${Title,0}</i> (${Place}: ${Publisher}, ${Year}), ${Page}.
</Template>
<Template Locale='en_US' Style='CMOS' Mode='ShortNote'>
${Author,2}, <i>${Title,1}</i>, ${Page}.
</Template>
<Template Locale='en_US' Style='CMOS' Mode='InText'>
(${Author,2}, ${Year})
</Template>
...
</Templates>
These Mickey-Mouse examples use HTML formatting to get the visual attributes such as italics. The elements are inserted at the right place using the ${name} place-holders. Note that the example assumes that the formatting templates know how to generate different elided variations of names and titles using a simple integer suffix in the place-holder that represents the required operation. We'll come to this again in a moment.
Let's input some data for a citation now:
Author (or Compiler): Smyth, Constantine Joseph
Book Title: A Complete Abstract of the Statutes of Nebraska, with Legal Forms
Publisher Name: Rees Printing Co.
Place of Publication: Omaha
Year Published: 1905
Page Number: 123
Here are some examples of what might then be generated for the various citation modes in your chosen style and locale:
Source List Entry
Smyth, Constantine Joseph. A Complete Abstract of the Statutes of Nebraska, with Legal Forms. Omaha: Rees Printing Co., 1905.
First (Full) Reference Note
Constantine Joseph Smyth, A Complete Abstract of the Statutes of Nebraska, with Legal Forms (Omaha: Rees Printing Co., 1905), 123.
Subsequent (Short) Note
Smyth, Complete Abstract of the Statutes of Nebraska, 123.
Parenthetical In-text Reference
(Smyth, 1905)
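For anyone who wants to see the place-holder expansion end-to-end, here is a minimal sketch in Python. It assumes the variant forms of the author and title are supplied as ready-made lists rather than computed from the text, and the function name render and the layout of the values dictionary are my own inventions for the illustration.

```python
import re

# Matches ${Name} or ${Name,n}; the optional integer selects a supplied form.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?:,(\d+))?\}")

def render(template, values):
    """Replace each place-holder with the requested form of the named element."""
    def substitute(match):
        name, form = match.group(1), int(match.group(2) or 0)
        value = values[name]
        # A list holds pre-supplied variant forms; anything else has one form.
        if isinstance(value, list):
            return value[form]
        return str(value)
    return PLACEHOLDER.sub(substitute, template)

TEMPLATES = {
    "SourceList": "${Author,0}. <i>${Title,0}</i>. ${Place}: ${Publisher}, ${Year}.",
    "FullNote": "${Author,1}, <i>${Title,0}</i> (${Place}: ${Publisher}, ${Year}), ${Page}.",
    "ShortNote": "${Author,2}, <i>${Title,1}</i>, ${Page}.",
    "InText": "(${Author,2}, ${Year})",
}

values = {
    "Author": ["Smyth, Constantine Joseph", "Constantine Joseph Smyth", "Smyth"],
    "Title": ["A Complete Abstract of the Statutes of Nebraska, with Legal Forms",
              "Complete Abstract of the Statutes of Nebraska"],
    "Place": "Omaha",
    "Publisher": "Rees Printing Co.",
    "Year": 1905,
    "Page": 123,
}

citations = {mode: render(text, values) for mode, text in TEMPLATES.items()}
```

With these inputs, citations["InText"] comes out as "(Smyth, 1905)" and citations["ShortNote"] as "Smyth, <i>Complete Abstract of the Statutes of Nebraska</i>, 123.", matching the examples above apart from the HTML italics.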
OK, so what are the issues to be raised here? Well, the simple element data-types in the meta-data are insufficient. Rather than just Text or Integer, the formatting templates need a little more semantic information, such as whether the value being inserted is a personal name, a place name, a date, a title, etc. Only then can they render dates according to the end-user's regional settings, or achieve the different levels of elided form.
Note that the meta-data is effectively defining the data-type required by the formatting template, and this isn't the same as the actual value being stored for a citation reference. In many software systems, this corresponds to the difference between 'formal parameters' and 'actual parameters'. Some level of coercion is required to convert the actual data to the required data. In the case of a personal name, the above illustration provides a textual representation of the author's name but it could easily be that of a relative, or some other individual, represented in the body of the BG data. Hence, the actual value could be text or a Person reference.
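The coercion step might look something like the following sketch in Python. The Person class is a hypothetical stand-in for a Person entity in the body of the data, and coerce_to_name is an invented helper, not part of any defined interface.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Hypothetical stand-in for a Person entity held in the body of the data."""
    display_name: str

def coerce_to_name(actual):
    """Convert an actual value (plain text or a Person reference) into the
    name text that the formal 'personal name' parameter of a template expects."""
    if isinstance(actual, Person):
        return actual.display_name
    if isinstance(actual, str):
        return actual
    raise TypeError(f"cannot coerce {type(actual).__name__} to a personal name")

from_text = coerce_to_name("Smyth, Constantine Joseph")
from_person = coerce_to_name(Person("Constantine Joseph Smyth"))
```

The template only ever sees the coerced text, so it need not care whether the author was typed in directly or linked to an individual already recorded in the data.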
In the illustration, we saw a simple integer used (but not recommended) to achieve the different variations of the author's name, e.g.
${Author,0} - "Smyth, Constantine Joseph"
${Author,1} - "Constantine Joseph Smyth"
${Author,2} - "Smyth"
I believe these rules are built into CSL but that would be another weakness. In general, you cannot do this from the incoming text alone, and - as anyone reading the background material on Person Names will appreciate - any system that tries to identify forename/surname concepts is doomed to failure.
Ideally, the formatting template system needs help from the stored BG data. For instance, rather than a single canonical name being held for an individual, it should support specific elided forms that can be requested by a report-writer or a citation formatting-template. This is in addition to some indication of the relevant sorting rules. Both of these issues were on my list to address for STEMMA but were never completed.
There are related issues for Place names, and even the title of the cited source.
Tony
Comments
Just to be clear on the names thing: If you had multiple authors, you'd have multiple Authors would you? Or would the single Author be extended to cover them all? Or is it up to the designer to decide?
And then you're recommending, I believe, that each Author is entered in 3(?) forms by the user rather than software attempting to shuffle the characters around? That would seem a sensible way of dealing with the complexities of the entirely fictional authors of "Bajoran Memories" by Ro Laren, Benjamin Sisko and Kira Nerys. Star Trek devotees will understand that Mesdames Ro and Kira already have their names in family-name-first format, so life would get complex trying to work out which names should be inverted for a citation format that puts the family name first.
Tony
For instance, consider something like:
"General Sir Anthony Cecil Hogmanay Melchett VC DSO KCB"
...he of Blackadder fame :-)
We can break this down into name-parts like titles, given name, middle names, family name, honours, etc.
Although those terms are very Western, something more generic might allow us to specify sorting and different forms by the token groups.
For instance:
fore-titles = 1 2
given-name = 3
middle-names = 4 5
family-name = 6
post-titles = 7 8 9
Hence, different name forms might be
family-name, fore-name middle-initials
fore-name family-name
family-name, fore-initial
Similarly with a sort-order or collation-sequence.
If we can break apart a name then it is practical to have standard combinations for different "name locales" (i.e. the locale that the name structure is relevant to rather than specifically that of the user). This would simplify the whole argument.
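As a rough sketch of the token-group idea in Python, using the Melchett example and the three name forms listed above. The function names and the labels used to pick a form are invented for the illustration, and the rules shown are only one Western-style combination.

```python
NAME = "General Sir Anthony Cecil Hogmanay Melchett VC DSO KCB"

# 1-based token positions for each group, as in the breakdown above.
GROUPS = {
    "fore-titles": [1, 2],
    "given-name": [3],
    "middle-names": [4, 5],
    "family-name": [6],
    "post-titles": [7, 8, 9],
}

def group(tokens, groups, name):
    """Return the tokens belonging to one named group."""
    return [tokens[i - 1] for i in groups[name]]

def initials(words):
    """Reduce each word to its first letter followed by a full stop."""
    return " ".join(word[0] + "." for word in words)

def name_form(full_name, groups, form):
    """Build one of three illustrative name forms from the token groups."""
    tokens = full_name.split()
    given = group(tokens, groups, "given-name")
    middle = group(tokens, groups, "middle-names")
    family = " ".join(group(tokens, groups, "family-name"))
    if form == "family, given middle-initials":
        return f"{family}, {' '.join(given)} {initials(middle)}"
    if form == "given family":
        return f"{' '.join(given)} {family}"
    if form == "family, given-initial":
        return f"{family}, {initials(given)}"
    raise ValueError(f"unknown name form: {form}")

long_form = name_form(NAME, GROUPS, "family, given middle-initials")
plain_form = name_form(NAME, GROUPS, "given family")
short_form = name_form(NAME, GROUPS, "family, given-initial")
```

These give "Melchett, Anthony C. H.", "Anthony Melchett" and "Melchett, A." respectively. The point is that only the group assignments are stored with the name, and the combinations themselves could be standardised per name locale.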
Tony
I assume you have had a look at the way names are handled in CSL. (It is interesting to note that it seems like Zotero does give you a way to enter the various parts specified in CSL.)
I don't think a solution that requires entering the various variants of a name will survive long, cf. Adrian's comment.
Apart from that, I suggest we discuss the issues here as part of the Personal name discussion, since I assume you will have the same requirements for sorting in e.g. an index of a report.
Here are some stylized bibliographic citations from WorldCat:
APA 6th
Charles, . (1989). _A vision of Britain: A personal view of architecture_. London: Doubleday.
Harvard 18th
CHARLES. (1989). _A vision of Britain: a personal view of architecture_. London, Doubleday.
MLA 7th
Charles, . _A Vision of Britain: A Personal View of Architecture_. London: Doubleday, 1989. Print.
Turabian 6th
Charles. _A Vision of Britain: A Personal View of Architecture_. London: Doubleday, 1989.